Snowflake Stream Ingest with Storage Integration

In today’s fast-paced business environment, organizations must leverage data effectively to make informed decisions and stay ahead of the competition. With data arriving continuously from IoT devices, mobile devices, and websites, there is a compelling need for real-time data processing.

Calibo's Data Pipeline Studio (DPS) supports processing streaming data using Snowflake Stream Ingest. If you use Amazon S3 as a data lake and ingest data into Snowflake, your pipeline looks like this:

Snowflake Stream Ingest pipeline

Amazon S3 (Data Lake) > Snowflake Stream Ingest (Data Integration) > Snowflake (Data Lake)

The data is first loaded into a temporary landing layer and then, after the selected operation is performed on it, into the unification layer. When you ingest streaming data from an S3 bucket into a Snowflake table, you must select a storage integration that is preconfigured in Snowflake and ensure that your S3 bucket grants access to the selected storage integration.
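
The sketch below illustrates the Snowflake-side setup that "preconfigured storage integration" refers to. It is a minimal example that a Snowflake administrator would run outside DPS, not something DPS generates; the integration name S3_INT, the IAM role ARN, and the bucket path are placeholders you would replace with your own values.

```python
# A minimal sketch of the administrator-side setup; S3_INT, the role ARN,
# and the bucket path are placeholders, not values DPS provides.
import snowflake.connector

conn = snowflake.connector.connect(
    user="<user>", password="<password>", account="<account>",
    role="ACCOUNTADMIN",  # creating integrations requires a privileged role
)
cur = conn.cursor()

# Create the storage integration that you later select in DPS.
cur.execute("""
    CREATE STORAGE INTEGRATION IF NOT EXISTS S3_INT
      TYPE = EXTERNAL_STAGE
      STORAGE_PROVIDER = 'S3'
      ENABLED = TRUE
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::<account-id>:role/<snowflake-access-role>'
      STORAGE_ALLOWED_LOCATIONS = ('s3://<your-bucket>/<path>/')
""")

# DESC INTEGRATION exposes the IAM user and external ID that must be added
# to the IAM role's trust policy so the S3 bucket grants Snowflake access.
cur.execute("DESC INTEGRATION S3_INT")
for prop, _, value, _ in cur.fetchall():
    if prop in ("STORAGE_AWS_IAM_USER_ARN", "STORAGE_AWS_EXTERNAL_ID"):
        print(prop, "=", value)
```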

Snowflake Stream Ingest uses Snowpipe to continuously load data from files as soon as they are available. This makes near real-time data available for processing. When you create a Snowflake stream ingest job, you create a task and specify the interval for the task. The task interval is the polling frequency at which the data is loaded from source to target after the specified operation is performed in the unification layer.
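
Conceptually, this pairs two Snowflake objects: a Snowpipe that copies each new file from the external stage into the landing table, and a task whose SCHEDULE is the polling interval that applies the selected operation into the unification layer. The sketch below illustrates this with assumed names (S3_STAGE, LANDING_TBL, UNIFIED_TBL, COMPUTE_WH) and a plain append as the operation; the objects DPS actually creates may differ.

```python
# A conceptual sketch with assumed names, not the objects DPS creates.
import snowflake.connector

conn = snowflake.connector.connect(
    user="<user>", password="<password>", account="<account>",
    warehouse="COMPUTE_WH", database="<db>", schema="<schema>",
)
cur = conn.cursor()

# Snowpipe: continuously loads files into the landing table as they arrive.
cur.execute("""
    CREATE PIPE IF NOT EXISTS LANDING_PIPE AUTO_INGEST = TRUE AS
      COPY INTO LANDING_TBL
      FROM @S3_STAGE
      FILE_FORMAT = (TYPE = 'JSON')
""")

# Task: runs at the configured interval (the "task interval") and applies
# the selected operation, here a simple append, into the unification layer.
cur.execute("""
    CREATE TASK IF NOT EXISTS UNIFY_TASK
      WAREHOUSE = COMPUTE_WH
      SCHEDULE = '5 MINUTE'
    AS
      INSERT INTO UNIFIED_TBL SELECT * FROM LANDING_TBL
""")
cur.execute("ALTER TASK UNIFY_TASK RESUME")  # tasks are created suspended
```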

To create a stream ingest data integration job

 
  1. On the home page of DPS, add the following stages:

    • Data Lake: Amazon S3

    • Data Integration: Snowflake Stream Ingest

    • Data Lake: Snowflake

  2. Configure the Amazon S3 and Snowflake nodes.

  3. Click the data integration node and click Create Job.

  4. Provide the required inputs to create the data integration job.

  5. Click Complete.

To run the Stream Ingest data integration job

 
  1. Publish the pipeline with the changes.

  2. Note that the Run Pipeline option is disabled. Click the down arrow adjacent to it and enable the toggle switch for Snowflake Stream Ingest 1.

    SF Stream Ingest enable stream

  3. The stream ingest job starts, and the status of the Snowflake Stream Ingest job changes to Running. (A sketch for checking this status programmatically follows these steps.)

    SF Stream Ingest job running state
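
If you want to verify ingestion outside the DPS UI, here is a minimal sketch of checking Snowpipe health directly in Snowflake. It assumes the hypothetical pipe name LANDING_PIPE from the earlier sketch; DPS manages the actual object names internally.

```python
# A sketch, not the DPS implementation: query the status of the
# (hypothetical) LANDING_PIPE pipe that loads the landing layer.
import json
import snowflake.connector

conn = snowflake.connector.connect(
    user="<user>", password="<password>", account="<account>"
)
cur = conn.cursor()

# SYSTEM$PIPE_STATUS returns a JSON document describing the pipe.
cur.execute("SELECT SYSTEM$PIPE_STATUS('LANDING_PIPE')")
status = json.loads(cur.fetchone()[0])
print("state:", status["executionState"])           # e.g. RUNNING or PAUSED
print("pending files:", status.get("pendingFileCount", 0))
```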

To stop the Stream Ingest data integration job

 
  1. On the DPS home page, click the down arrow (adjacent to Run Pipeline) and disable the toggle for Snowflake Stream Ingest. The job stops running and the status changes to Terminated.
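
Disabling the toggle is a DPS action; the equivalent effect at the Snowflake level can be sketched as pausing the pipe and suspending the task. The names LANDING_PIPE and UNIFY_TASK below are the hypothetical ones from the earlier sketches, not names DPS guarantees.

```python
# A sketch of stopping ingestion at the Snowflake level, assuming the
# hypothetical object names from the earlier examples.
import snowflake.connector

conn = snowflake.connector.connect(
    user="<user>", password="<password>", account="<account>"
)
cur = conn.cursor()
cur.execute("ALTER PIPE LANDING_PIPE SET PIPE_EXECUTION_PAUSED = TRUE")
cur.execute("ALTER TASK UNIFY_TASK SUSPEND")
```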

 

What's next? Data Ingestion using Amazon Kinesis Data Streams with S3 Data Lake